Current Issue : April - June Volume : 2017 Issue Number : 2 Articles : 5 Articles
Background: More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the aggressive tumor phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients’ outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease.
Methods: Multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting the NB-hypo signature to develop a predictive model for neuroblastoma patients’ outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized Gene Set Enrichment Analysis (GSEA) to evaluate the enrichment of hypoxia-related gene sets in patients predicted with “Poor” or “Good” outcome.
Results: We utilized the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop an MLP classifier predicting patients’ outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent set of 82 tumors. The NB-hypo classifier predicted the patients’ outcome with the remarkable accuracy of 87 %. NB-hypo classifier prediction resulted in 2 % classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100 % accurate in assessing the death of five low/intermediate-risk patients. GSEA of the tumor gene expression profiles demonstrated the hypoxic status of the tumor in patients with poor prognosis.
Conclusions: We developed a robust classifier predicting neuroblastoma patients’ outcome with a very low error rate and we provided independent evidence that the poor-outcome patients had hypoxic tumors, supporting the potential of using hypoxia as a target for neuroblastoma treatment....
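The leave-one-out analysis mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: a nearest-centroid classifier stands in for the MLP to keep the example dependency-free, and the toy data (2 features and 6 samples rather than 62 probe sets and 100 tumors) and all function names are assumptions made for illustration.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Stand-in classifier (NOT an MLP): one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict(model, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: dist(model[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, score the held-out one."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = fit_nearest_centroid(train_X, train_y)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


# Hypothetical toy expression values, invented for this example.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = ["good", "good", "good", "poor", "poor", "poor"]
loo_accuracy = leave_one_out_accuracy(X, y)
```

In a setup like the one described, the classifier chosen by this procedure would then be refit on the full training set and scored once on the held-out external tumors.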
Identifying the residues in a protein that are involved in protein-protein interaction and identifying the contact matrix for a pair of interacting proteins are two computational tasks at different levels of an in-depth analysis of protein-protein interaction. Various methods for solving these two problems have been reported in the literature. However, the interacting residue prediction and contact matrix prediction were handled by and large independently in those existing methods, though intuitively good prediction of interacting residues will help with predicting the contact matrix. In this work, we developed a novel protein interacting residue prediction system, contact matrix-interaction profile hidden Markov model (CM-ipHMM), with the integration of contact matrix prediction and the ipHMM interaction residue prediction. We propose to leverage what is learned from the contact matrix prediction and utilize the predicted contact matrix as “feedback” to enhance the interaction residue prediction. The CM-ipHMM model showed significant improvement over the previous method that uses the ipHMM for predicting interaction residues only. It indicates that the downstream contact matrix prediction could help the interaction site prediction....
Background: In biological experiments on soybean species, molecular markers are widely used to verify the soybean genome or construct its genetic map. Among a variety of molecular markers, insertions and deletions (InDels) are preferred for their wide distribution and high density at the whole-genome level. Hence, the problem of detecting InDels based on next-generation sequencing data is of great importance for the design of InDel markers. To tackle it, this paper integrated machine learning techniques with existing software and developed two algorithms for InDel detection: the best F-score method (BF-M) and the Support Vector Machine (SVM) method (SVM-M), which is based on the classical SVM model.
Results: The experimental results show that the performance of BF-M was promising, as indicated by the high precision and recall scores, whereas SVM-M yielded the best performance in terms of recall and F-score. Moreover, based on the InDel markers detected by SVM-M from soybeans that were collected from 56 different regions, highly polymorphic loci were selected to construct an InDel marker database for soybean.
Conclusions: Compared to existing software tools, the two algorithms proposed in this work produced substantially higher precision and recall scores, and remained stable in various types of genomic regions. Moreover, based on SVM-M, we have constructed a database for soybean InDel markers and published it for academic research....
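The precision, recall and F-score used to compare the InDel detectors above follow their standard definitions, sketched below. The InDel records (chromosome, position, type) are invented for illustration, the function name is hypothetical, and the F-score shown is the balanced F1 variant, which is an assumption since the abstract does not say which weighting was used.

```python
def precision_recall_f1(called, truth):
    """Score a set of called variants against a reference set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical InDels keyed by (chromosome, position, type).
truth = {("Chr01", 1050, "INS"), ("Chr01", 2200, "DEL"),
         ("Chr02", 310, "DEL"), ("Chr03", 77, "INS")}
called = {("Chr01", 1050, "INS"), ("Chr01", 2200, "DEL"),
          ("Chr02", 999, "INS")}

# Two of three calls are correct; two of four true InDels are found.
precision, recall, f1 = precision_recall_f1(called, truth)
```

Matching on exact position is a simplification; a practical comparison would usually tolerate small positional shifts when pairing called and reference InDels.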
Background: Identifying molecular signatures of disease phenotypes is studied using two mainstream approaches: (i) Predictive modeling methods such as linear classification and regression algorithms are used to find signatures predictive of phenotypes from genomic data, which may not be robust due to limited sample size or highly correlated nature of genomic data. (ii) Gene set analysis methods are used to find gene sets on which phenotypes are linearly dependent by bringing prior biological knowledge into the analysis, which may not capture more complex nonlinear dependencies. Thus, formulating an integrated model of gene set analysis and nonlinear predictive modeling is of great practical importance.
Results: In this study, we propose a Bayesian binary classification framework to integrate gene set analysis and nonlinear predictive modeling. We then generalize this formulation to multitask learning setting to model multiple related datasets conjointly. Our main novelty is the probabilistic nonlinear formulation that enables us to robustly capture nonlinear dependencies between genomic data and phenotype even with small sample sizes. We demonstrate the performance of our algorithms using repeated random subsampling validation experiments on two cancer and two tuberculosis datasets by predicting important disease phenotypes from genome-wide gene expression data.
Conclusions: We are able to obtain comparable or even better predictive performance than a baseline Bayesian nonlinear algorithm and to identify sparse sets of relevant genes and gene sets on all datasets. We also show that our multitask learning formulation enables us to further improve the generalization performance and to better understand biological processes behind disease phenotypes....
Background: Kinase over-expression and activation as a consequence of gene amplification or gene fusion events is a well-known mechanism of tumorigenesis. The search for novel rearrangements of kinases or other druggable genes may contribute to understanding the biology of cancerogenesis, as well as lead to the identification of new candidate targets for drug discovery. However, this requires the ability to query large datasets to identify rare events occurring in very small fractions (1–3 %) of different tumor subtypes. This task differs from that of conventional tools, which find genes differentially expressed between two experimental conditions.
Results: We propose a computational method aimed at the automatic identification of genes which are selectively over-expressed in a very small fraction of samples within a specific tissue. The method does not require a healthy counterpart or a reference sample for the analysis and can therefore also be applied to transcriptional data generated from cell lines. In our implementation the tool can use gene-expression data from microarray experiments, as well as data generated by RNASeq technologies.
Conclusions: The method was implemented as a publicly available, user-friendly tool called KAOS (Kinase Automatic Outliers Search). The tool enables the automatic execution of iterative searches for the identification of extreme outliers and for the graphical visualization of the results. Filters can be applied to select the most significant outliers. The performance of the tool was evaluated using a synthetic dataset and compared to state-of-the-art tools. KAOS performs particularly well in detecting genes that are overexpressed in few samples or when an extreme outlier stands out on a highly variable expression background.
To validate the method on real case studies, we used publicly available tumor cell line microarray data, and we were able to identify genes which are known to be overexpressed in specific samples, as well as novel ones....
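The idea of flagging genes that are over-expressed in only a small fraction of samples, without a reference sample, can be sketched with a generic robust outlier score. This is not the KAOS algorithm, whose details the abstract does not give: the median/MAD modified z-score, the cutoff of 5, the 5 % fraction limit, the gene names and the toy data are all assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD to match the standard deviation of a normal.
    return [0.6745 * (v - med) / mad for v in values]


def overexpression_outliers(expression, z_cutoff=5.0, max_fraction=0.05):
    """Map each gene to the sample indices where it is an extreme high outlier.

    Genes are kept only if the outlier samples are a small fraction of all
    samples, so broadly variable genes are not reported.
    """
    hits = {}
    for gene, values in expression.items():
        scores = robust_z_scores(values)
        idx = [i for i, s in enumerate(scores) if s > z_cutoff]
        if idx and len(idx) <= max_fraction * len(values):
            hits[gene] = idx
    return hits


# Hypothetical log-expression values for 40 samples of one tissue.
gene_a = [5.0 + 0.1 * (i % 5) for i in range(40)]
gene_a[17] = 12.0                              # one sample stands out
gene_b = [4.0 + (i % 8) for i in range(40)]    # variable, but no outlier
outliers = overexpression_outliers({"GENE_A": gene_a, "GENE_B": gene_b})
```

Because the score is computed within each gene across samples of the same tissue, no healthy counterpart is needed, which mirrors the property the abstract highlights.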